Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(swarm): allow configuration to idle connection timeout #4161

Merged
merged 33 commits into from
Sep 19, 2023

Conversation

startup-dreamer
Copy link
Contributor

@startup-dreamer startup-dreamer commented Jul 8, 2023

Description

Previously, a connection would be shut down immediately as soon as its ConnectionHandler reports KeepAlive::No. As we have gained experience with libp2p, it turned out that this isn't ideal.

For one, tests often need to keep connections alive longer than the configured protocols require. Plus, some usecases require connections to be kept alive in general.

Both of these needs are currently served by the keep_alive::Behaviour. That one does essentially nothing other than statically returning KeepAlive::Yes from its ConnectionHandler.

It makes much more sense to deprecate keep_alive::Behaviour and instead allow users to globally configure an idle_conncetion_timeout on the Swarm. This timeout comes into effect once a ConnectionHandler reports KeepAlive::No. To start with, this timeout is 0. Together with #3844, this will allow us to move towards a much more aggressive closing of idle connections, together with a more ergonomic way of opting out of this behaviour.

Fixes #4121.

Notes & open questions

am I in the right direction also we might add tests for this feature.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@startup-dreamer startup-dreamer changed the title added idle_connection_timeout for swarm builder feat(SwarmBuilder): added idle_connection_timeout for swarm builder Jul 8, 2023
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks like a good start!

I'd like us to use this directly in libp2p-swarm-test and see which tests we can remove keep_alive::Behaviour from, should be all of them but maybe something else turns up!

swarm/src/connection.rs Outdated Show resolved Hide resolved
swarm/src/connection.rs Outdated Show resolved Hide resolved
@thomaseizinger thomaseizinger changed the title feat(SwarmBuilder): added idle_connection_timeout for swarm builder feat(swarm): allow configuration of idle connection timeout Jul 9, 2023
@startup-dreamer startup-dreamer force-pushed the feat/idle_connection_timeout branch from a0a5d44 to 073cee1 Compare July 11, 2023 13:50
@startup-dreamer

This comment was marked as resolved.

@thomaseizinger
Copy link
Contributor

Hy @thomaseizinger can you guide me through how I am gonna remove keep_alive::Behaviour tests can't wrap my head around this.

Sure!

So the context is that with the current code, a connection will be immediately shut down as soon as a ConnectionHandler reports KeepAlive::No.

In many tests, we only use a single NetworkBehaviour which is quite an artificial scenario and as such, we often run into the problem that connections get shut down too quickly despite us still wanting to use them in the test.

With this new feature, we can delay that shutdown which in turn allows us to remove keep_alive::Behaviour from many tests.

I'd take the following approach:

  • Put a #[deprecated] attribute on keep_alive::Behaviour
  • Run cargo check / cargo clippy with --all-targets
  • Go through each usage that is a test and remove it. Most of the time, this will be as simple as deleting a field from a struct
  • Run the test. Likely, it will fail.
  • Check where the Swarm in this particular test gets constructed.
    • For the usages coming from Swarm::new_ephemeral, you can extend the SwarmBuilder in there to configure a 10 sec keep alive timeout. That should be more than plenty for each test. This will likely fix a lot of tests because we extensively use that test harness in our codebase.
    • For any other usage, simply extend that SwarmBuilder with a 10 second timeout.

Does that make sense? You can try yourself on one or two tests and then push a commit and we can look at it together :)

swarm/src/connection.rs Outdated Show resolved Hide resolved
swarm/src/connection.rs Outdated Show resolved Hide resolved
swarm/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the help here.

swarm/src/connection.rs Outdated Show resolved Hide resolved
@startup-dreamer
Copy link
Contributor Author

Thank you for the help here.

I still wan't to work on this however I am busy in some other work so if anyone wanna take this he can.
If not I will complete this as soon as possible.

@thomaseizinger
Copy link
Contributor

Thank you for the help here.

I still wan't to work on this however I am busy in some other work so if anyone wanna take this he can.
If not I will complete this as soon as possible.

Thank you! No pressure from our end, your help is much appreciated :)

@thomaseizinger
Copy link
Contributor

@startup-dreamer I've pushed some commits that should help move this forward. In particular, I've added the deprecation attribute and converted the first test. Most of the others should be similar :)

@startup-dreamer
Copy link
Contributor Author

@startup-dreamer I've pushed some commits that should help move this forward. In particular, I've added the deprecation attribute and converted the first test. Most of the others should be similar :)

graet help sir will try to move forward from here.

@startup-dreamer

This comment was marked as outdated.

@mergify
Copy link
Contributor

mergify bot commented Aug 4, 2023

This pull request has merge conflicts. Could you please resolve them @startup-dreamer? 🙏

@startup-dreamer startup-dreamer changed the title feat(swarm): allow configuration of idle connection timeout feat(swarm): allow configuration to idle connection timeout Aug 5, 2023
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far :)

@thomaseizinger
Copy link
Contributor

@thomaseizinger should i wait for you to review each refactor I does or do a bunch of refactoring and then take the review

Looks good so far, you are on the right track :)

@startup-dreamer startup-dreamer force-pushed the feat/idle_connection_timeout branch from 27f2345 to f91d242 Compare August 6, 2023 08:26
@thomaseizinger
Copy link
Contributor

@mxinden I spent some cycles on this to push it over the finish line as there wasn't much missing. Please review!

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, otherwise this looks good to me.

Let's have this run on am6-rust (IPFS bootstrap node staging server) once merged into master before cutting a release. Will be deployed automatically via our nightly job once merged.

swarm/src/connection.rs Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

Let's have this run on am6-rust (IPFS bootstrap node staging server) once merged into master before cutting a release. Will be deployed automatically via our nightly job once merged.

With or without setting a custom timeout?

@thomaseizinger thomaseizinger added this to the Simplify idle connection management milestone Sep 17, 2023
@mergify
Copy link
Contributor

mergify bot commented Sep 18, 2023

This pull request has merge conflicts. Could you please resolve them @startup-dreamer? 🙏

swarm/Cargo.toml Outdated Show resolved Hide resolved
@mxinden
Copy link
Member

mxinden commented Sep 19, 2023

Let's have this run on am6-rust (IPFS bootstrap node staging server) once merged into master before cutting a release. Will be deployed automatically via our nightly job once merged.

With or without setting a custom timeout?

I suggest without, to validate that the status-quo doesn't change.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Thank you for the tests!

@mxinden
Copy link
Member

mxinden commented Sep 19, 2023

@thomaseizinger can you address the CI failures? Otherwise this is ready to merge from my side.

@thomaseizinger
Copy link
Contributor

@thomaseizinger can you address the CI failures? Otherwise this is ready to merge from my side.

Done! I'll send it then :)

@mergify mergify bot merged commit c52a2fc into libp2p:master Sep 19, 2023
69 checks passed
thomaseizinger pushed a commit to dgarus/rust-libp2p that referenced this pull request Sep 20, 2023
Previously, a connection would be shut down immediately as soon as its `ConnectionHandler` reports `KeepAlive::No`. As we have gained experience with libp2p, it turned out that this isn't ideal.

For one, tests often need to keep connections alive longer than the configured protocols require. Plus, some usecases require connections to be kept alive in general.

Both of these needs are currently served by the `keep_alive::Behaviour`. That one does essentially nothing other than statically returning `KeepAlive::Yes` from its `ConnectionHandler`.

It makes much more sense to deprecate `keep_alive::Behaviour` and instead allow users to globally configure an `idle_conncetion_timeout` on the `Swarm`. This timeout comes into effect once a `ConnectionHandler` reports `KeepAlive::No`. To start with, this timeout is 0. Together with libp2p#3844, this will allow us to move towards a much more aggressive closing of idle connections, together with a more ergonomic way of opting out of this behaviour.

Fixes libp2p#4121.

Pull-Request: libp2p#4161.
@startup-dreamer startup-dreamer deleted the feat/idle_connection_timeout branch September 20, 2023 13:53
thomaseizinger pushed a commit that referenced this pull request Sep 21, 2023
Previously, a connection would be shut down immediately as soon as its `ConnectionHandler` reports `KeepAlive::No`. As we have gained experience with libp2p, it turned out that this isn't ideal.

For one, tests often need to keep connections alive longer than the configured protocols require. Plus, some usecases require connections to be kept alive in general.

Both of these needs are currently served by the `keep_alive::Behaviour`. That one does essentially nothing other than statically returning `KeepAlive::Yes` from its `ConnectionHandler`.

It makes much more sense to deprecate `keep_alive::Behaviour` and instead allow users to globally configure an `idle_conncetion_timeout` on the `Swarm`. This timeout comes into effect once a `ConnectionHandler` reports `KeepAlive::No`. To start with, this timeout is 0. Together with #3844, this will allow us to move towards a much more aggressive closing of idle connections, together with a more ergonomic way of opting out of this behaviour.

Fixes #4121.

Pull-Request: #4161.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

swarm: Introduce an idle_connection_timeout config option
3 participants